Skip to content

Conversation

@angela-tarantula
Copy link
Contributor

@angela-tarantula angela-tarantula commented Oct 14, 2025

EDIT: This was my first attempt. Converting to EA Dtypes and back just to prevent lossy conversion is a discouraged approach not taken in this codebase, according to maintainers.

@angela-tarantula angela-tarantula changed the title WIP: add targeted casting to combine_first WIP: Fix precision of DataFrame.combine_first Oct 19, 2025
@angela-tarantula angela-tarantula changed the title WIP: Fix precision of DataFrame.combine_first WIP: Fix precision of DataFrame.combine and DataFrame.combine_first Oct 24, 2025
if y.name not in self.columns:
return y_values

return expressions.where(mask, y_values, x_values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not using expressions.where have a perf impact?

(-1666880195890293744, "int64"),
),
)
def test_combine_first_preserve_precision(self, wide_val, dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you made changes to both combine and combine_first, but only a test for combine_first. can you test the fixed bug in combine?

"""Resolve the combined dtypes according to the original dtypes."""
cast_map: dict[IndexLabel, DtypeObj] = {}
for col in combined_df.columns:
ser = combined_df[col]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not going to be robust to non-unique columns. can you use .iloc[:, i] instead?

orig_dt_self = self_orig.dtypes.get(col)
orig_dt_other = other_orig.dtypes.get(col)

was_promoted = (orig_dt_self in [np.int64, np.uint64]) or (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no risk for smaller int widths?

2 NaN 3.0 1.0
"""

# GH#60128 Integers n where |n| > 2**53 would lose precision after align
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align only upcasts to float when the frames have different indexes right? would it be simpler to just warn users and tell them to do alignment before calling .combine?

@angela-tarantula
Copy link
Contributor Author

angela-tarantula commented Oct 30, 2025

My bad! I meant to close this draft PR.

@angela-tarantula
Copy link
Contributor Author

angela-tarantula commented Oct 30, 2025

This is not the right solution to the problem. A maintainer told me that casting to EA dtypes and back is an controversial approach and discouraged in this codebase.

I have already merged this PR which partially addresses bug report. The combiner did not previously preserve EA dtypes as it should, leaving users no way to avoid float64 lossy conversion even with EA dtypes. I'm currently thinking of a stopgap that can close the bug report without using EA dtypes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Series.combine_first loss of precision

2 participants